Use git_odb_exists instead of git_revparse_single for ObjectExists #2006

Open: tyrielv wants to merge 1 commit into microsoft:master from tyrielv:tyrielv/odb-exists

tyrielv commented Jun 4, 2026

Summary

Replaces the heavyweight git_revparse_single call in LibGit2Repo.ObjectExists with git_odb_exists, a purpose-built existence check that skips revparse expression parsing and git_object handle allocation.

Benchmark (os.2020 enlistment, 59.7M objects, 14 packs)

| Scenario | Before (revparse) | After (odb_exists) | Speedup |
|---|---|---|---|
| Existing objects | ~800 ns/op | ~800 ns/op | ~1x |
| Missing objects | 2,800,000 ns/op | 1,320,000 ns/op | 2.1x |
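
For readers checking the arithmetic, the sketch below shows how per-op timings and the speedup column relate. It is not the harness used for this PR (the description does not say how the timings were collected); `BenchSketch`, `MeasureNsPerOp`, and `Speedup` are hypothetical helpers built on `System.Diagnostics.Stopwatch`.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    // Average wall-clock nanoseconds per call of `op` over `iterations` runs.
    public static double MeasureNsPerOp(Action op, int iterations)
    {
        op(); // warm-up call so one-time JIT cost is not counted

        long start = Stopwatch.GetTimestamp();
        for (int i = 0; i < iterations; i++)
        {
            op();
        }
        long ticks = Stopwatch.GetTimestamp() - start;

        // Stopwatch.Frequency is ticks per second; convert ticks to nanoseconds.
        return ticks * (1e9 / Stopwatch.Frequency) / iterations;
    }

    // Ratio of old to new per-op time; values above 1 mean the new path is faster.
    public static double Speedup(double beforeNs, double afterNs) => beforeNs / afterNs;
}
```

With the numbers from the table, `BenchSketch.Speedup(2_800_000, 1_320_000)` is about 2.12, which rounds to the 2.1x reported for missing objects.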

Implementation

  • Add P/Invoke bindings for git_repository_odb, git_odb_exists, git_odb_free, git_oid_fromstr
  • ODB handle lazily acquired on first ObjectExists call, freed in Dispose
  • Falls back to git_revparse_single if ODB handle acquisition fails
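
The three points above could look roughly like the following. This is a minimal sketch, not the code in `LibGit2Repo.cs`: the `Native` and `RepoSketch` names, the `"git2"` library name, and the `IsFullSha` pre-check are assumptions made for illustration. It also assumes a SHA-1 build of libgit2 (so a `git_oid` is 20 raw bytes), that callers pass full 40-character hex SHAs, and it ignores thread safety.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class Native
{
    // Assumption: the native library resolves as "git2"; real builds may use another name.
    private const string Lib = "git2";

    [DllImport(Lib, EntryPoint = "git_repository_odb", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int RepositoryOdb(out IntPtr odb, IntPtr repo);

    // Returns 1 if the object is found, 0 otherwise.
    [DllImport(Lib, EntryPoint = "git_odb_exists", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int OdbExists(IntPtr odb, byte[] oid);

    [DllImport(Lib, EntryPoint = "git_odb_free", CallingConvention = CallingConvention.Cdecl)]
    internal static extern void OdbFree(IntPtr odb);

    [DllImport(Lib, EntryPoint = "git_oid_fromstr", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int OidFromStr([Out] byte[] oid, string sha);

    [DllImport(Lib, EntryPoint = "git_revparse_single", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int RevparseSingle(out IntPtr obj, IntPtr repo, string spec);

    [DllImport(Lib, EntryPoint = "git_object_free", CallingConvention = CallingConvention.Cdecl)]
    internal static extern void ObjectFree(IntPtr obj);
}

public sealed class RepoSketch : IDisposable
{
    private readonly IntPtr repo;      // git_repository*, owned by the caller in this sketch
    private IntPtr odb = IntPtr.Zero;  // git_odb*, acquired on first ObjectExists call
    private bool odbAttempted;

    public RepoSketch(IntPtr repoHandle) { this.repo = repoHandle; }

    // Hypothetical pre-check: reject anything that is not 40 hex characters.
    public static bool IsFullSha(string sha)
    {
        if (sha == null || sha.Length != 40) return false;
        foreach (char c in sha)
        {
            if (!Uri.IsHexDigit(c)) return false;
        }
        return true;
    }

    public bool ObjectExists(string sha)
    {
        if (!IsFullSha(sha)) return false;

        if (!this.odbAttempted)
        {
            this.odbAttempted = true;
            if (Native.RepositoryOdb(out this.odb, this.repo) != 0) this.odb = IntPtr.Zero;
        }

        // No ODB handle: fall back to the slower revparse path.
        if (this.odb == IntPtr.Zero) return this.ObjectCanBeParsed(sha);

        byte[] oid = new byte[20];
        if (Native.OidFromStr(oid, sha) != 0) return false;
        return Native.OdbExists(this.odb, oid) == 1;
    }

    public bool ObjectCanBeParsed(string sha)
    {
        if (Native.RevparseSingle(out IntPtr obj, this.repo, sha) != 0) return false;
        Native.ObjectFree(obj);
        return true;
    }

    public void Dispose()
    {
        if (this.odb != IntPtr.Zero)
        {
            Native.OdbFree(this.odb);
            this.odb = IntPtr.Zero;
        }
    }
}
```

Holding one ODB handle for the lifetime of the repo object avoids a `git_repository_odb`/`git_odb_free` pair on every lookup; the handle is ref-counted by libgit2, so it must be released exactly once in `Dispose`.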

Semantic split: ObjectExists vs ObjectCanBeParsed

git_odb_exists only checks whether an OID is present in the object store -- it does not decompress or validate the object. This changes behavior for LooseObjectsStep.ClearCorruptLooseObjects, which relied on ObjectExists returning false for corrupt-but-present loose objects to trigger cleanup.

To preserve that behavior, this PR introduces ObjectCanBeParsed(sha) which retains the old git_revparse_single path for corruption detection. LooseObjectsStep is updated to call ObjectCanBeParsed instead of ObjectExists.

| Method | Implementation | Use case |
|---|---|---|
| ObjectExists | git_odb_exists (fast) | Existence checks -- prefetch, diff walk, commit verification |
| ObjectCanBeParsed | git_revparse_single (slow but validates) | Corruption detection -- loose object cleanup |
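
To make the split concrete, here is a small in-memory illustration of why loose object cleanup has to use the validating call. Everything in it is hypothetical (`IObjectProbe`, `FakeRepo`, `LooseObjectCleanupSketch`); it models a corrupt-but-present object with a boolean flag rather than touching libgit2 or the real `LooseObjectsStep`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IObjectProbe
{
    bool ObjectExists(string sha);       // presence only
    bool ObjectCanBeParsed(string sha);  // presence plus a successful load
}

// In-memory fake: maps each sha to whether its stored bytes are intact.
public sealed class FakeRepo : IObjectProbe
{
    private readonly Dictionary<string, bool> intactBySha = new Dictionary<string, bool>();

    public void Add(string sha, bool intact) => this.intactBySha[sha] = intact;

    public bool ObjectExists(string sha) => this.intactBySha.ContainsKey(sha);

    public bool ObjectCanBeParsed(string sha) =>
        this.intactBySha.TryGetValue(sha, out bool intact) && intact;
}

public static class LooseObjectCleanupSketch
{
    // Returns the loose objects that should be deleted as corrupt.
    public static List<string> FindCorrupt(IObjectProbe repo, IEnumerable<string> looseShas) =>
        looseShas.Where(sha => !repo.ObjectCanBeParsed(sha)).ToList();
}
```

After `repo.Add("good", true)` and `repo.Add("bad", false)`, `repo.ObjectExists("bad")` is true, so a cleanup keyed on `ObjectExists` would leave the corrupt object in place, while `FindCorrupt(repo, new[] { "good", "bad" })` returns only `"bad"`.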

Impact

All ObjectExists callers benefit automatically:

  • FindBlobsStage (blob prefetch)
  • EnumerateMissingTreeEntries (diff/tree walk)
  • CommitAndRootTreeExists (commit verification)

LooseObjectsStep preserves existing corruption-detection semantics via ObjectCanBeParsed.

Files changed

  • LibGit2Repo.cs -- new P/Invokes, ObjectExists uses git_odb_exists, new ObjectCanBeParsed
  • GitRepo.cs -- expose ObjectCanBeParsed through the invoker
  • LooseObjectsStep.cs -- call ObjectCanBeParsed instead of ObjectExists

tyrielv force-pushed the tyrielv/odb-exists branch from 8d6d6dd to 40cce5f (June 4, 2026 19:28)
tyrielv marked this pull request as ready for review (June 4, 2026 20:25)

Replace the heavyweight git_revparse_single call in LibGit2Repo.ObjectExists
with git_odb_exists, a purpose-built existence check that skips revparse
expression parsing and git_object handle allocation.

Benchmarked on an os.2020 enlistment (59.7M objects, 14 packs):
- Existing objects: ~800 ns/op (comparable)
- Missing objects: 1.3ms vs 2.8ms (2.1x faster)

The ODB handle is lazily acquired on first ObjectExists call via
git_repository_odb (returns the repo's internal ODB, ref-counted)
and freed in Dispose. Falls back to revparse if ODB acquisition fails.

Add ObjectCanBeParsed method that retains the old revparse behavior
for callers that need corruption detection (LooseObjectsStep), since
git_odb_exists only checks index presence, not object integrity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>

tyrielv force-pushed the tyrielv/odb-exists branch from 40cce5f to 2f9a2b1 (June 5, 2026 17:12)